6 research outputs found

Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism

Author: Emami Mahyar
Kamahori Keisuke
Kashani Sahand
Larus James R.
Pourghannad Mohammad Sepehr
Raj Ritik
Publication date: 23/01/2023

The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware rely heavily on cycle-accurate simulation of register-transfer-level (RTL) designs. The best software RTL simulators can simulate designs at 1–1000 kHz, i.e., more than three orders of magnitude slower than hardware. Faster simulation can increase productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to use parallelism, as RTL exposes considerable fine-grain concurrency. However, state-of-the-art RTL simulators generally perform best when single-threaded, since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore, a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate runtime synchronization barriers among many simple processors. Manticore relies entirely on its compiler to schedule resources and communication. Because RTL code is practically free of long divergent execution paths, static scheduling is feasible. Communication and synchronization no longer incur runtime overhead, enabling efficient fine-grain parallelism. Moreover, static scheduling dramatically simplifies the physical implementation, significantly increasing the potential parallelism on a chip. Our 225-core FPGA prototype running at 475 MHz outperforms a state-of-the-art RTL simulator on an Intel Xeon processor running at ≈3.3 GHz by up to 27.9× (geomean 5.3×) in nine Verilog benchmarks.

arXiv.org e-Print Archive
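The Manticore abstract above says the compiler assigns every operation to a processor and a time slot ahead of time, so no runtime barrier is needed within a simulated cycle. The sketch below illustrates that idea with a simple greedy list scheduler over a dependency graph. The function `static_schedule`, the as-soon-as-possible policy, one operation per core per step, and zero-latency communication are all hypothetical simplifications, not Manticore's actual compiler.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def static_schedule(deps, num_cores):
    """Greedy compile-time list scheduling (hypothetical simplification).

    deps maps each operation to the set of operations whose results it reads.
    Returns a dict mapping every operation to a fixed (step, core) slot.
    Assumes one operation per core per step and zero-latency communication.
    """
    slot = {}
    used = defaultdict(int)  # step -> number of cores already occupied
    # static_order() yields each operation after all of its predecessors.
    for op in TopologicalSorter(deps).static_order():
        # Earliest legal step: one after the latest predecessor's step.
        step = max((slot[p][0] + 1 for p in deps.get(op, ())), default=0)
        # If every core is busy in that step, push the operation later.
        while used[step] >= num_cores:
            step += 1
        slot[op] = (step, used[step])
        used[step] += 1
    return slot


# A tiny combinational netlist: c = f(a, b), e = g(a), d = h(c).
deps = {"c": {"a", "b"}, "e": {"a"}, "d": {"c"}}
schedule = static_schedule(deps, num_cores=2)
for op, (step, core) in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"step {step}, core {core}: {op}")
```

Because every slot is fixed before simulation starts, each core can simply run its own instruction list in lockstep with the others. A real compiler would also have to schedule the messages between cores, which this sketch ignores.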

Auto-Partitioning Heterogeneous Task-Parallel Programs with StreamBlocks

Author: Bezati Endri
Emami Mahyar
Janneck Jörn W.
Larus James R.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/10/2022

FPGAs play an increasing role in the reconfigurable accelerator landscape. A key challenge in designing FPGA-based systems is partitioning computation between processor cores and FPGAs. An appropriate division of labor is difficult to predict in advance and requires experiments and measurements. When an investigation requires rewriting part of the system in a new language or with a new programming model, its high cost can delay design-space exploration. A single-language system with an appropriate programming model and compiler that targets both platforms transforms this tedious exploration into a simple recompile with new compiler directives. This work introduces StreamBlocks, a unified open-source software/FPGA compiler and runtime that takes dataflow programs written in Cal and automatically partitions them across heterogeneous CPU/FPGA platforms. The explicit task-parallel semantics of dataflow allows our compiler to simultaneously take advantage of thread parallelism in software and spatial parallelism in hardware. StreamBlocks is augmented with a profile-guided autopartitioning tool that helps identify the best hardware-software partitions. We demonstrate the capability of our compiler in finding the right balance between hardware and software execution on both a high-end datacenter accelerator card and an embedded board. Our experiments exhibit a 4-7× speedup over trivial partitions. This speedup is achieved automatically with zero code modifications.

Lund University Publications
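The StreamBlocks abstract describes profile-guided partitioning without stating its cost model. As a hedged illustration only, the sketch below exhaustively searches CPU/FPGA placements for a handful of actors under an assumed model: estimated time = max(total CPU time, total FPGA time) + the transfer cost of every channel that crosses the boundary. The function `best_partition`, the profile numbers, and the cost model are hypothetical; StreamBlocks' actual tool may work differently.

```python
from itertools import product


def best_partition(sw_time, hw_time, channels):
    """Exhaustively pick a CPU/FPGA placement for each actor (toy cost model).

    sw_time / hw_time: profiled time of each actor on the CPU / FPGA.
    channels: {(producer, consumer): cost of moving its tokens between devices}.
    Assumed cost: both devices run concurrently, so the slower one dominates,
    plus the transfer cost of every channel cut by the partition.
    """
    actors = sorted(sw_time)
    best = None
    for placement in product(("cpu", "fpga"), repeat=len(actors)):
        where = dict(zip(actors, placement))
        cpu = sum(sw_time[a] for a in actors if where[a] == "cpu")
        fpga = sum(hw_time[a] for a in actors if where[a] == "fpga")
        cut = sum(c for (u, v), c in channels.items() if where[u] != where[v])
        cost = max(cpu, fpga) + cut
        if best is None or cost < best[0]:
            best = (cost, where)
    return best


# Hypothetical profile: the filter is much faster in hardware, the sink is not.
sw = {"src": 1, "filt": 10, "sink": 1}
hw = {"src": 2, "filt": 2, "sink": 20}
links = {("src", "filt"): 1, ("filt", "sink"): 1}
cost, placement = best_partition(sw, hw, links)
print(cost, placement)  # prints: 4 {'filt': 'fpga', 'sink': 'cpu', 'src': 'cpu'}
```

Exhaustive search grows as 2^n in the number of actors, so it only suits tiny examples; a practical tool would need a smarter search or solver.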

A CMOS Current-Mode Low-Power RMS-to-DC Converter

Author: AREKHI ABDOLGHANI
DANESH MOHAMMAD HADI
DEHDAST MAHYAR
FARD AMIN EMAMI
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 10/09/2020

In this paper, a low-power current-mode RMS-to-DC converter is proposed. The converter includes a two-quadrant squarer/divider and a first-order low-pass filter cell, both of which use MOS translinear loops. The RMS-to-DC converter has low power consumption (< 0.75 μW), a low supply voltage (0.8 V), a wide input range (from 40 nA to 500 nA), low relative error (< 3%), and low circuit complexity. A comparison with two other current-mode circuits shows that the proposed circuit outperforms both in terms of power dissipation, supply voltage, and complexity. HSPICE simulation results show high performance of the circuit and confirm the validity of the proposed design technique.

Interscience Research Network
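The quantity this converter produces is the RMS value of its input, I_rms = sqrt(mean(i_in²)). A common way to obtain it from a squarer/divider and a low-pass filter is the implicit loop, in which the divider's denominator is the converter's own output; the abstract above does not spell out this connection, so treat it as an assumption. The discrete-time sketch below models only that loop behaviorally (the name `implicit_rms`, the smoothing factor `alpha`, and the normalized units are hypothetical), not the transistor-level MOS translinear circuit.

```python
import math


def implicit_rms(samples, alpha=1e-3, y0=1.0):
    """Behavioral model of an implicit RMS-to-DC loop (assumed topology).

    Each step, the squarer/divider forms x**2 / y from the input x and the
    current output y, and a first-order low-pass filter (smoothing factor
    alpha) moves y toward that value. At steady state mean(x**2 / y) = y,
    i.e. y = sqrt(mean(x**2)), the RMS of the input.
    """
    y = y0
    for x in samples:
        y += alpha * (x * x / y - y)
    return y


# A zero-mean sine of amplitude 1 (normalized units): its RMS is 1/sqrt(2).
period = 100
wave = [math.sin(2 * math.pi * n / period) for n in range(period * 1000)]
print(implicit_rms(wave), 1 / math.sqrt(2))
```

Because the input is squared, positive and negative input currents contribute alike, which is why a two-quadrant squarer/divider suffices; the remaining output ripple shrinks as `alpha` (the filter bandwidth) is reduced, at the cost of slower settling.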

Wavelet-enhanced convolutional neural network: a new idea in a deep learning paradigm

Author: Azimi Seyedmajid
Emami Hassan
Ghafoori Mahyar
Hajiabadi Mohamadreza
Savareh Behrouz Alizadeh
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2018

Purpose: Manual brain tumor segmentation is a challenging task, and automating it calls for machine learning techniques. One machine learning technique that has received much attention is the convolutional neural network (CNN). The performance of a CNN can be enhanced by combining it with other data analysis tools such as the wavelet transform.
Materials and methods: In this study, a well-known CNN implementation, the fully convolutional network (FCN), was used for brain tumor segmentation, and its architecture was enhanced with the wavelet transform, which served as a complementary, enhancing tool for the CNN.
Results: Comparing the performance of the basic FCN architecture against the wavelet-enhanced form revealed a remarkable superiority of the enhanced architecture in brain tumor segmentation tasks.
Conclusion: Enhancing tools such as the wavelet transform and other mathematical functions can improve the performance of CNNs in any image processing task, such as segmentation and classification.

Institute of Transport Research:Publications

eprints Iran University of Medical Sciences
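The wavelet-enhanced FCN abstract does not say which wavelet is used or where it enters the network. As one hedged illustration, the sketch below computes a single-level 2D Haar transform with NumPy and stacks the four sub-bands as channels that a CNN could consume. The Haar choice, the channel-stacking strategy, and the names `haar_dwt2` and `wavelet_channels` are assumptions, not the authors' architecture.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an even-sized image."""
    img = np.asarray(img, dtype=float)
    if img.shape[0] % 2 or img.shape[1] % 2:
        raise ValueError("image height and width must be even")
    # The four pixels of every non-overlapping 2x2 block.
    a, b = img[0::2, 0::2], img[0::2, 1::2]  # top-left, top-right
    c, d = img[1::2, 0::2], img[1::2, 1::2]  # bottom-left, bottom-right
    ll = (a + b + c + d) / 2  # local average (approximation band)
    lh = (a + b - c - d) / 2  # top minus bottom: responds to horizontal edges
    hl = (a - b + c - d) / 2  # left minus right: responds to vertical edges
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def wavelet_channels(img):
    """Stack the sub-bands into a (4, H/2, W/2) array, e.g. as CNN input."""
    return np.stack(haar_dwt2(img))


# A 4x4 image with a vertical edge between columns 0 and 1.
img = np.array([[0, 1, 1, 1]] * 4)
bands = wavelet_channels(img)
print(bands.shape)  # (4, 2, 2)
print(bands[2])     # HL band: nonzero only where the edge falls inside a block
```

Each sub-band has half the input resolution, so a network would have to consume these channels at a downsampled stage or upsample them first; which option fits best is left open here. Naming conventions for the LH and HL bands also vary between libraries.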

Polypyrrole/multiwall carbon nanotube nanocomposites electropolymerized on copper substrate

Author: Hamed Arami
Mahyar Mazloumi
Razieh Khalifehzadeh
S K Sadrnezhaad
Shahriar Hojjati Emami
Publication date: 11/04/2020

Polypyrrole/multiwall carbon nanotube (PPy/MWCNT) nanocomposites were successfully synthesized by electropolymerization of an MWCNT-dispersed pyrrole solution on the surface of copper electrodes. The obtained nanocomposites were characterized by scanning electron microscopy (SEM), linear sweep voltammetry (LSV), and thermogravimetric analysis (TGA). Polypyrrole structures that wrapped around the MWCNTs led to the formation of striated, parallel nanocomposite walls. MWCNTs acted as appropriate substrates for the electrodeposition of particulate polypyrrole structures, and high-yield synthesis of PPy was observed on them. Smooth PPy/MWCNT nanocomposite films were obtained on Cu electrodes by decreasing the potential scan rate. Thermogravimetric analysis showed that MWCNTs increased the thermal stability of polypyrrole.

CiteSeerX

Wavelet-enhanced convolutional neural network: a new idea in a deep learning paradigm

Author: Behrouz Alizadeh Savareh
Hassan Emami
Mahyar Ghafoori
Mohamadreza Hajiabadi
Seyed Majid Azimi
Publication venue: Walter de Gruyter GmbH

Crossref